AITopics | martingale property

Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning

Neural Information Processing Systems | Jun-22-2026, 18:02:35 GMT

Recent advances in reasoning techniques have substantially improved the performance of large language models (LLMs), raising expectations for their ability to provide accurate, truthful, and reliable information. However, emerging evidence suggests that iterative reasoning may foster belief entrenchment rather than enhance truth-seeking behavior. In this study, we propose a systematic evaluation framework for belief entrenchment in LLM reasoning by leveraging the Martingale property from Bayesian statistics. This property implies that, under rational belief updating, the expected value of future beliefs should equal the current belief, i.e., belief updates cannot be predicted from the current belief alone. We propose the unsupervised, regression-based Martingale Score to measure violations of this property, which signal a departure from Bayesian updating on new evidence. In open-ended problem domains, including event forecasting, value-laden questions, and academic paper review, we found such violations to be widespread across models, reasoning paradigms, problem domains, and system prompts: future beliefs are consistently predictable from the model's current belief, a phenomenon we term belief entrenchment. Through comprehensive experiments, we identify the models (e.g., GPT-4o), reasoning techniques (e.g., chain of thought), and domains (e.g., forecasting) that are more prone to belief entrenchment. Finally, we validate the Martingale Score by showing that it predicts ground-truth accuracy in problem domains where ground-truth labels are available. This indicates that, although designed as an unsupervised metric that operates even in domains without access to ground truth, the Martingale Score is a useful proxy for the truth-seeking ability of the LLM reasoning process.
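The abstract describes the Martingale Score only as regression-based. A minimal sketch, assuming the score is the slope of an ordinary least-squares fit of the belief update (posterior minus prior) on the prior belief; the paper's exact specification may differ, and both simulated agents below are hypothetical. A Bayesian agent that updates on a noisy binary signal should score near zero, while an agent that always nudges its belief further from 0.5 scores positive.

```python
import random

def martingale_score(prior, posterior):
    """OLS slope of the update (posterior - prior) regressed on the prior.

    Assumed definition for illustration. Under the martingale property
    E[update | prior] = 0, so the slope should be close to zero.
    """
    n = len(prior)
    updates = [q - p for p, q in zip(prior, posterior)]
    mean_p = sum(prior) / n
    mean_u = sum(updates) / n
    cov = sum((p - mean_p) * (u - mean_u) for p, u in zip(prior, updates))
    var = sum((p - mean_p) ** 2 for p in prior)
    return cov / var

def bayes_update(prior, signal, acc=0.8):
    """Posterior P(event) after a binary signal that is correct with prob. acc."""
    like_true = acc if signal else 1 - acc
    like_false = 1 - acc if signal else acc
    return prior * like_true / (prior * like_true + (1 - prior) * like_false)

rng = random.Random(0)
priors, bayes_posts, entrenched_posts = [], [], []
for _ in range(20_000):
    p = rng.uniform(0.05, 0.95)
    event = rng.random() < p                      # truth drawn from the agent's own prior
    signal = (rng.random() < 0.8) == event        # signal matches the truth w.p. 0.8
    priors.append(p)
    bayes_posts.append(bayes_update(p, signal))
    entrenched_posts.append(p + 0.1 * (p - 0.5))  # hypothetical agent: drift away from 0.5

bayes_score = martingale_score(priors, bayes_posts)            # close to 0
entrenched_score = martingale_score(priors, entrenched_posts)  # 0.1 by construction
```

A positive slope means that higher current beliefs receive systematically larger upward updates, so the direction of the next update is predictable from the current belief, which is the pattern the abstract calls belief entrenchment.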

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.92)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)


7f53f8c6c730af6aeb52e66eb74d8507-Paper.pdf

Neural Information Processing Systems | Apr-26-2026, 14:40:54 GMT

data mining, machine learning, prediction, (18 more...)

Neural Information Processing Systems

Industry:

Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science > Data Mining (0.94)
(2 more...)


7f53f8c6c730af6aeb52e66eb74d8507-Paper.pdf

Neural Information Processing Systems | Feb-9-2026, 13:16:46 GMT

forecast, prediction, probability path, (13 more...)

Neural Information Processing Systems

Country:

Oceania > Australia (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
(2 more...)

Industry:

Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science > Data Mining (0.94)
(2 more...)


Volatility Modeling via EWMA-Driven Time-Dependent Hurst Parameters

Athipatla, Jayanth

arXiv.org Artificial Intelligence | Sep-9-2025

We introduce a novel rough Bergomi (rBergomi) model featuring a variance-driven exponentially weighted moving average (EWMA) time-dependent Hurst parameter $H_t$, fundamentally distinct from recent machine learning and wavelet-based approaches in the literature. Our framework pioneers a unified rough differential equation (RDE) formulation grounded in rough path theory, where the Hurst parameter dynamically adapts to evolving volatility regimes through a continuous EWMA mechanism tied to instantaneous variance. Unlike discrete model-switching or computationally intensive forecasting methods, our approach provides mathematical tractability while capturing volatility clustering and roughness bursts. We rigorously establish existence and uniqueness of solutions via rough path theory and derive martingale properties. Empirical validation on diverse asset classes including equities, cryptocurrencies, and commodities demonstrates superior performance in capturing dynamics and out-of-sample pricing accuracy. Our results show significant improvements over traditional constant-Hurst models.
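The abstract does not state the functional form linking the EWMA to $H_t$. The sketch below shows only the EWMA recursion driven by an instantaneous-variance series; the exponential map from smoothed variance to a Hurst value in $(h_{min}, h_{max})$, the bounds, the reference scale, and the decay `lam=0.94` are hypothetical choices for illustration, not the paper's model.

```python
import math

def ewma_hurst(variances, lam=0.94, h_min=0.05, h_max=0.45, v_ref=None):
    """Map an instantaneous-variance series to a time-varying Hurst path.

    EWMA recursion: s_t = lam * s_{t-1} + (1 - lam) * v_t, seeded with v_0.
    Hypothetical link: H_t = h_min + (h_max - h_min) * exp(-s_t / v_ref),
    so a higher smoothed variance gives a rougher path (smaller H_t).
    """
    if v_ref is None:
        v_ref = sum(variances) / len(variances)  # assumed scale: average variance
    s = variances[0]
    path = []
    for v in variances:
        s = lam * s + (1 - lam) * v              # exponentially weighted average
        path.append(h_min + (h_max - h_min) * math.exp(-s / v_ref))
    return path

# Calm regime followed by a high-variance regime (illustrative numbers).
regimes = [0.01] * 50 + [0.09] * 50
H = ewma_hurst(regimes)
```

Because the EWMA responds gradually, $H_t$ drifts toward its high-variance level over several steps instead of jumping, which is the contrast with discrete model-switching drawn in the abstract.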

artificial intelligence, assumption 2, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2509.0582

Genre: Research Report > New Finding (0.86)

Industry: Banking & Finance > Trading (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


LLMs are Bayesian, in Expectation, not in Realization

Chlon, Leon, Rashidi, Sarah, Khamis, Zein, Awada, MarcAntonio M.

arXiv.org Machine Learning | Jul-17-2025

Large language models demonstrate remarkable in-context learning capabilities, adapting to new tasks without parameter updates. While this phenomenon has been successfully modeled as implicit Bayesian inference, recent empirical findings reveal a fundamental contradiction: transformers systematically violate the martingale property, a cornerstone requirement of Bayesian updating on exchangeable data. This violation challenges the theoretical foundations underlying uncertainty quantification in critical applications. Our theoretical analysis establishes four key results: (1) positional encodings induce martingale violations of order $\Theta(\log n / n)$; (2) transformers achieve information-theoretic optimality with excess risk $O(n^{-1/2})$ in expectation over orderings; (3) the implicit posterior representation converges to the true Bayesian posterior in the space of sufficient statistics; and (4) we derive the optimal chain-of-thought length as $k^* = \Theta(\sqrt{n}\log(1/\varepsilon))$ with explicit constants, providing a principled approach to reduce inference costs while maintaining performance. Empirical validation on GPT-3 confirms predictions (1)-(3), with transformers reaching 99% of theoretical entropy limits within 20 examples. Our framework provides practical methods for extracting calibrated uncertainty estimates from position-aware architectures and optimizing computational efficiency in deployment.
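One way to read "Bayesian in expectation over orderings" is that averaging a position-aware predictor over all permutations of its context removes the order dependence. A minimal sketch, assuming a hypothetical recency-weighted frequency estimator as a stand-in for a transformer with positional encodings; this illustrates the idea only and is not the paper's procedure.

```python
from itertools import permutations

def recency_weighted_estimate(seq, decay=0.7):
    """Hypothetical position-aware predictor: P(next = 1) as a frequency in
    which later context items get geometrically larger weight."""
    weights = [decay ** (len(seq) - 1 - i) for i in range(len(seq))]
    return sum(w * x for w, x in zip(weights, seq)) / sum(weights)

def permutation_averaged(seq, predictor=recency_weighted_estimate):
    """Average a predictor over every ordering of the context, which makes the
    result invariant to the order in which the examples were presented."""
    orderings = list(permutations(seq))
    return sum(predictor(list(o)) for o in orderings) / len(orderings)

# Same multiset of examples, two presentation orders.
single_a = recency_weighted_estimate([1, 1, 0, 0])
single_b = recency_weighted_estimate([0, 0, 1, 1])
avg_a = permutation_averaged([1, 1, 0, 0])
avg_b = permutation_averaged([0, 0, 1, 1])
```

Because every element occupies every position equally often across permutations, the averaged recency-weighted estimate reduces to the plain sample frequency (0.5 here). Enumerating all n! orderings is only feasible for tiny contexts; for longer ones, sampling random permutations would be the practical variant.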

large language model, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2507.11768

Genre: Research Report (1.00)


Lost in Retraining: Roaming the Parameter Space of Exponential Families Under Closed-Loop Learning

Jangjoo, Fariba, Marsili, Matteo, Roudi, Yasser

arXiv.org Machine Learning | Jul-10-2025

Closed-loop learning is the process of repeatedly estimating a model from data generated by the model itself. It is receiving considerable attention because large neural network models may, in the future, be trained primarily on data generated by artificial neural networks themselves. We study this process for models that belong to exponential families, deriving equations of motion that govern the dynamics of the parameters. We show that maximum likelihood estimation of the parameters endows the sufficient statistics with the martingale property and that, as a result, the process converges to absorbing states that amplify initial biases present in the data. However, we show that this outcome can be prevented in three ways: if the data contain at least one data point generated from a ground-truth model, by relying on maximum a posteriori estimation, or by introducing regularisation.
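The absorbing-state behaviour can be illustrated with the simplest exponential family: a Bernoulli model refit by maximum likelihood (the sample frequency) on n draws from its own previous fit. The refit frequency is then a martingale that eventually hits 0 or 1 and stays there. A minimal sketch; the parameter values, and the choice to replace one of the n samples per generation with a draw from a hypothetical ground-truth rate `p_true`, are illustrative assumptions.

```python
import random

def closed_loop(p0, n, steps, rng, p_true=None):
    """Repeatedly refit a Bernoulli model by maximum likelihood on n samples
    drawn from the current fit. If p_true is given, one of the n samples in
    each generation comes from the ground-truth model instead."""
    p = p0
    path = [p]
    for _ in range(steps):
        n_model = n - 1 if p_true is not None else n
        ones = sum(rng.random() < p for _ in range(n_model))  # self-generated data
        if p_true is not None:
            ones += rng.random() < p_true                      # one ground-truth point
        p = ones / n  # MLE of a Bernoulli parameter: the sample frequency
        path.append(p)
    return path

pure = closed_loop(0.5, n=10, steps=2000, rng=random.Random(1))
anchored = closed_loop(0.5, n=10, steps=2000, rng=random.Random(1), p_true=0.3)
```

Without the anchor, the estimate ends stuck at 0.0 or 1.0, since a fit of exactly 0 or 1 can only reproduce itself. With one ground-truth draw per generation, the estimate keeps being knocked off the boundary and never settles.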

artificial intelligence, closed-loop learning, machine learning, (19 more...)

arXiv.org Machine Learning

2506.20623

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Systems and Facilities > Geothermal System for Power Generation > Advanced Geothermal System (AGS) (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)


Uncertainty Quantification for Prior-Data Fitted Networks using Martingale Posteriors

Nagler, Thomas, Rügamer, David

arXiv.org Machine Learning | May-21-2025

Prior-data fitted networks (PFNs) have emerged as promising foundation models for prediction from tabular data sets, achieving state-of-the-art performance on small to moderate data sizes without tuning. While PFNs are motivated by Bayesian ideas, they do not provide any uncertainty quantification for predictive means, quantiles, or similar quantities. We propose a principled and efficient sampling procedure to construct Bayesian posteriors for such estimates based on Martingale posteriors, and prove its convergence. Several simulated and real-world data examples showcase the uncertainty quantification of our method in inference applications.
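Martingale posteriors replace prior-to-posterior updating with predictive resampling: repeatedly impute future observations from the current one-step predictive, then compute the statistic of interest on the completed sequence; the spread of that statistic across repetitions plays the role of the posterior. A minimal sketch, assuming a Pólya-urn predictive (resample uniformly from observed plus already-imputed points) as a hypothetical stand-in for a PFN's predictive distribution; with this particular predictive the procedure recovers the Bayesian bootstrap. The paper's sampler for PFNs is not reproduced here.

```python
import random

def martingale_posterior_mean(data, n_future=1000, n_draws=200, seed=0):
    """Posterior draws for the mean via predictive resampling.

    The one-step predictive is a Polya urn: the next value is a uniform draw
    from everything seen so far (observed + imputed). This is a stand-in for
    a PFN's predictive, chosen only because it is simple.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        pool = list(data)
        for _ in range(n_future):
            pool.append(rng.choice(pool))    # impute the next observation
        draws.append(sum(pool) / len(pool))  # statistic on the completed sequence
    return draws

draws = martingale_posterior_mean([1.0, 2.0, 3.0, 4.0, 5.0])
```

Stopping after `n_future` imputations approximates the infinite-horizon limit. The running mean of the pool is itself a martingale, so the draws are centred on the observed mean while their spread quantifies uncertainty about it.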

large language model, machine learning, posterior, (17 more...)

arXiv.org Machine Learning

2505.11325

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)


Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective

Falck, Fabian, Wang, Ziyu, Holmes, Chris

arXiv.org Machine Learning | Jun-2-2024

In-context learning (ICL) has emerged as a particularly remarkable characteristic of Large Language Models (LLM): given a pretrained LLM and an observed dataset, LLMs can make predictions for new data points from the same distribution without fine-tuning. Numerous works have postulated ICL as approximately Bayesian inference, rendering this a natural hypothesis. In this work, we analyse this hypothesis from a new angle through the martingale property, a fundamental requirement of a Bayesian learning system for exchangeable data. We show that the martingale property is a necessary condition for unambiguous predictions in such scenarios, and enables a principled, decomposed notion of uncertainty vital in trustworthy, safety-critical systems. We derive actionable checks with corresponding theory and test statistics which must hold if the martingale property is satisfied. We also examine if uncertainty in LLMs decreases as expected in Bayesian learning when more data is observed. In three experiments, we provide evidence for violations of the martingale property, and deviations from a Bayesian scaling behaviour of uncertainty, falsifying the hypothesis that ICL is Bayesian.
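For an exact Bayesian learner on exchangeable data, the martingale property says that the current predictive equals the expected predictive after one more observation drawn from that same predictive: $p(y_{n+2} \mid y_{1:n}) = \mathbb{E}_{y_{n+1} \sim p(\cdot \mid y_{1:n})}[p(y_{n+2} \mid y_{1:n+1})]$. A minimal check with a Beta-Bernoulli model, using exact fractions so equality can be tested without rounding; the prior parameters are arbitrary, and the test statistics the authors derive for LLMs are not reproduced here.

```python
from fractions import Fraction

def predictive(ones, zeros, a=1, b=1):
    """Beta(a, b)-Bernoulli posterior predictive P(y_next = 1 | data)."""
    return Fraction(ones + a, ones + zeros + a + b)

def expected_next_predictive(ones, zeros, a=1, b=1):
    """E[P(y_{n+2} = 1 | y_{1:n+1}) | y_{1:n}], averaging over y_{n+1}
    drawn from the current predictive."""
    p = predictive(ones, zeros, a, b)
    return (p * predictive(ones + 1, zeros, a, b)
            + (1 - p) * predictive(ones, zeros + 1, a, b))
```

A model whose predictions satisfy this identity for every history passes the check; a learner that, for example, over-reacts to the most recent example would make `expected_next_predictive` differ from `predictive`, which is the kind of violation the abstract reports for ICL.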

bayesian, experiment, martingale property, (12 more...)

arXiv.org Machine Learning

2406.00793

Country:

Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)


On the Equivalence of Consistency-Type Models: Consistency Models, Consistent Diffusion Models, and Fokker-Planck Regularization

Lai, Chieh-Hsin, Takida, Yuhta, Uesaka, Toshimitsu, Murata, Naoki, Mitsufuji, Yuki, Ermon, Stefano

arXiv.org Artificial Intelligence | Jun-1-2023

The emergence of various notions of "consistency" in diffusion models has garnered considerable attention and helped achieve improved sample quality, likelihood estimation, and accelerated sampling. Although similar concepts have been proposed in the literature, the precise relationships among them remain unclear. Here, a consistency-type model is a (diffusion) model explicitly designed to align with the underlying trajectory defined by an ordinary differential equation (ODE), stochastic differential equation (SDE), or partial differential equation (PDE). In this study, we aim to provide a theoretical investigation into the relationships between these three consistency-type models. Under certain mild assumptions, we rigorously establish the equivalence of these independently developed concepts.
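For orientation, two of the three notions can be written as follows. These are the standard formulations as commonly stated, not taken from this paper; the notation ($f_\theta$ for the consistency function, $h_\theta$ for a denoiser approximating $\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t]$, $\epsilon$ for the smallest time) is an assumption, and the Fokker-Planck regularization is not shown.

```latex
% Consistency models: for x_t, x_{t'} on the same probability-flow ODE trajectory,
f_\theta(\mathbf{x}_t, t) = f_\theta(\mathbf{x}_{t'}, t')
  \quad \forall\, t, t' \in [\epsilon, T],
\qquad f_\theta(\mathbf{x}_\epsilon, \epsilon) = \mathbf{x}_\epsilon .

% Consistent diffusion models: with x_{t'} drawn from the model's own
% reverse-time SDE started at x_t (t' < t), the denoiser is a martingale:
\mathbb{E}_\theta\!\left[\, h_\theta(\mathbf{x}_{t'}, t') \mid \mathbf{x}_t \right]
  = h_\theta(\mathbf{x}_t, t).
```

The second condition is where the martingale tag on this entry comes from: the predicted clean sample should not drift in expectation as the model runs its own reverse process.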

artificial intelligence, machine learning, martingale, (14 more...)

arXiv.org Artificial Intelligence

2306.00367

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > California > Santa Clara County > Stanford (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Probability Paths and the Structure of Predictions over Time

Lin, Zhiyuan, Sheng, Hao, Goel, Sharad

arXiv.org Machine Learning | Jun-11-2021

In settings ranging from weather forecasts to political prognostications to financial projections, probability estimates of future binary outcomes often evolve over time. For example, the estimated likelihood of rain on a specific day changes by the hour as new information becomes available. Given a collection of such probability paths, we introduce a Bayesian framework -- which we call the Gaussian latent information martingale, or GLIM -- for modeling the structure of dynamic predictions over time. Suppose, for example, that the likelihood of rain in a week is 50%, and consider two hypothetical scenarios. In the first, one expects the forecast is equally likely to become either 25% or 75% tomorrow; in the second, one expects the forecast to stay constant for the next several days. A time-sensitive decision-maker might select a course of action immediately in the latter scenario, but may postpone their decision in the former, knowing that new information is imminent. We model these trajectories by assuming predictions update according to a latent process of information flow, which is inferred from historical data. In contrast to general methods for time series analysis, this approach preserves the martingale structure of probability paths and better quantifies future uncertainties around probability paths. We show that GLIM outperforms three popular baseline methods, producing better estimated posterior probability path distributions measured by three different metrics. By elucidating the dynamic structure of predictions over time, we hope to help individuals make more informed choices.
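The martingale structure of a probability path can be seen in a simple latent-Gaussian construction: the outcome is 1 when a latent Gaussian random walk ends above zero, and the forecast at time t is the conditional probability of that event given the walk so far. A minimal sketch under that assumption; it is not the GLIM estimation procedure, which infers the information flow from historical paths. The check confirms by simulation that the expected next forecast equals the current one, just as in the rain example, where equal chances of moving to 25% or 75% average back to 50%.

```python
import math
import random

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def forecast(x, remaining_var):
    """P(X_T > 0 | X_t = x) when X_T = x + N(0, remaining_var)."""
    return norm_cdf(x / math.sqrt(remaining_var))

def one_step_expectation(x, remaining_var, step_var, n=100_000, seed=0):
    """Monte Carlo estimate of E[p_{t+1} | X_t = x] after one Gaussian
    information increment of variance step_var."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x_next = x + rng.gauss(0.0, math.sqrt(step_var))
        total += forecast(x_next, remaining_var - step_var)
    return total / n

p_now = forecast(0.3, 2.0)
p_next = one_step_expectation(0.3, 2.0, 0.5)
```

The identity holds exactly because averaging $\Phi((x + \varepsilon)/s')$ over $\varepsilon \sim N(0, v)$ with $s'^2 = R - v$ gives $\Phi(x/\sqrt{R})$; how much variance each step carries (the information flow) controls how volatile the path is, which is the quantity GLIM estimates.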

forecast, prediction, probability path, (14 more...)

arXiv.org Machine Learning

2106.06515

Country:

Oceania > Australia (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports (1.00)
Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)
